Automated generation of consistent annual maximum NDVI on coal bases with a new algorithm

Coal is one of the most important fossil energy sources and is ensuring global energy security. Annual maximum NDVI (Normalized Difference Vegetation Index) data is an important indicator for the research in balancing coal mining and vegetation conservation. However, the existing annual maximum NDVI data displayed lower values with temporally inconsistent and a noticeable mosaic line. Here we propose an algorithm for automatically generating the annual maximum NDVI of China’s coal bases in Google Earth Engine called: Auto-NDVIcb. The accuracy of the Auto-NDVIcb algorithm has been verified with an average RMSE of 0.087 for the 14 coal bases from 2013 to 2022. Based on the proposed Auto-NDVIcb algorithm, an annual maximum NDVI dataset for all 14 coal bases in China from 2013 to 2022 was publicly released. This dataset can be fast and automatically updated online. Hence, the public dataset will continuously serve to monitor the vegetation change induced by coal mining, exploring the mechanism of vegetation degradation, and providing scientific data for developing vegetation protection policies in coal mines.


Background & Summary
Energy and the environment are at the centre of the world's current concerns.As one of the most important fossil energy sources, coal has been playing a crucial role in ensuring global energy security.However, coal exploitation has led to a range of ecological and environmental challenges, in particular, the degradation of vegetation, which is getting more and more attention from the major coal-producing countries of the world, such as America, Australia, and China [1][2][3][4] .
Normalized Difference Vegetation Index (NDVI) is a popular and effective indicator for characterizing the growth and coverage of vegetation.Since it was proposed in the 1970s 5 , numerous researchers have successfully used NDVI to characterize vegetation status in terms of spatial and temporal change [6][7][8][9][10][11] .For example, Gao et al. 8 used NDVI as an indicator for statistical regression to analyse vegetation changes at different elevations over the past 30 years.Yang et al. 12,13 used the NDVI time series and the shape-based clustering method for monitoring vegetation change in a surface coal mining area.Li et al. 14 used NDVI data to investigate the vegetation cover to reflect the disturbance and restoration of the opencast mining areas.Therefore, high-quality NDVI data are urgently needed to monitor vegetation changes in coal bases.It can not only improve the knowledge of ecological changes in the mining area but also guide and optimize production and ecological restoration activities in the mining area.Due to mining and restoration activities, time-series change of vegetation at the object-scale in mining areas is usually temporally abrupt.Moreover, the inter-annual variation in phenology also influences the NDVI in a similar season every year.The annual maximum NDVI data better reflects the real growth of the vegetation.
There are two main methods for acquiring NDVI data.The first method is to retrieve NDVI data directly from publicly available NDVI products.The second method is to perform the own band math to generate NDVI by the users.
(1) Using publicly available NDVI products.Some NDVI products based on different satellite sensors have been produced by various institutions or experts over the decades.The primary NDVI products are listed in Table 1, involved with satellites/sensors such as the Advanced Very High-Resolution Radiometer (AVHRR) [15][16][17] , SPOT/VEGETATION, Moderate Resolution Imaging Spectroradiometer (MODIS) 18,19 , Visible Infrared Imaging Radiometer Suite (VIIRS) 20,21 , GF series 22 , Landsat series 23 , and Sentinel series 24 .
The spatial resolution of existing NDVI products varies from 10 km (e.g., GIMMS NDVI) to 1 km-250 m (e.g., MODIS NDVI), 30 m (Landsat NDVI), and even 10 m (Sentinel-2 NDVI).Products such as "GIMMS NDVI From AVHRR Sensors", "NOAA CDR AVHRR NDVI", and "MODIS Terra Daily NDVI" offer high temporal resolution but low spatial resolution, making them unsuitable for small-scale studies.Low-resolution NDVI images may not capture fine vegetation features within mining areas, such as small vegetation cover or unevenly distributed vegetation.Additionally, the presence of multiple surface cover types around the coal base, including bare soil, ore stockpiles, vegetation, etc., can lead to the mixing of these diverse surfaces in a single image element.The accuracy of vegetation analysis results may be compromised.Conversely, "China 10 m Year-by-Year NDVI Maximum Dataset (2016-2021)" and "MuSyQ GF-series 16 m/10-day NDVI Vegetation Index Dataset (2018-2020)" 25 have a higher spatial resolution, but only provide data for recent several years, which cannot satisfy the long-time series research.Landsat satellites combine the advantages of both temporal and spatial resolution.Landsat data have great potential for exploitation as they provide a wealth of free time-series public historical imagery with a spatial resolution of 30 m [26][27][28] .Consequently, several NDVI data products have been developed based on Landsat 29 .For example, the "China annual maximum NDVI dataset (30 m,1986-2021)" relies on Landsat top of atmosphere (TOA) data as its source, computing NDVI throughout the year and selecting the maximum value to mosaic 30 .The "China annual maximum NDVI dataset (30 m, 2000-2020)" presented by Dong 31 was generated by extracting NDVI values from valid Landsat surface reflectance (SR) observations throughout the year.Both linear interpolation and the Savitzky-Golay (SG) smoothing filter methods were used in this process 32 .However, there are some minor defects with these products in China's coal bases.First, when using TOA data to calculate NDVI, the results are impacted by the atmosphere 33 .
Secondly, the cloud cover would cause images missing during the most vigorous period of vegetation growth [34][35][36] .It is not appropriate for existing products to directly use data from other times to fill in the gaps.Furthermore, the problem of inconsistent time-phases for different images of Landsat are often ignored.This may cause notable mosaic lines in the NDVI data, as well as the annual maximum NDVI that falls considerably below the real value.(2) User-computed NDVI: Traditionally, users can compute NDVI through band math utilizing local software.
Nevertheless, this approach is often time-consuming, especially when handling a large number of NDVI images.In contrast, Google Earth Engine (GEE) has become a leading platform for producing extensive time series of NDVI data [37][38][39][40] .For instance, Chen et al. 41 presented the "Gap Filling and Savitzky-Golay filtering (GF-SG)" approach that effectively reconstructs 8-day NDVI time-series data using MODIS and Landsat data within GEE.Furthermore, GEE provides the qualityMosaic (QM) method for extracting the maximum value from image collections.However, it addresses missing pixels by incorporating data from other times.In summary, due to satellite revisiting capability and cloud cover, there is missing data during the most vigorous vegetation growth period.The fusion of NDVI images with inconsistent imaging time causes some problems as follows.(1) The extracted annual maximum values of NDVI are significantly lower than the real values.(2) Visually, the NDVI image displayed a noticeable mosaic line.
In this paper, we proposed a GEE-based algorithm for automatically calculating the temporally-consistent annual maximum NDVI of China's coal bases (Auto-NDVI cb ).Auto-NDVI cb is based on Landsat SR data and achieves online fast generation of seamless annual maximum NDVI data with long-time series, spatial continuity, and temporal consistency.High-quality vegetation monitoring data of coal bases provides solid data support for ecological environment monitoring in mining areas.The main contributions of this paper can be summarized below: (1) Auto-NDVI cb solved the problem of the mosaic line caused by the lack of images during the vigorous vegetation growth and inconsistency in the acquiring time of nearby images.And is capable of producing temporally-consistent annual maximum NDVI of China's coal bases.(2) The annual maximum NDVI dataset of 14 coal bases in China from 2013 to 2022 42 is publicly available, which is closer to the reality of surface vegetation rather than just visual spatial smoothness.Compared with the existing annual maximum NDVI products and the current method, it shows better results at the scale of coal bases.(3) Auto-NDVI cb can be fully automated in GEE, so the public dataset can be fast and automatically updated online in a continuous way based on the proposed Auto-NDVI cb algorithm.
Remotely sensed images.The remotely sensed images utilized in this study were the "USGS Landsat 8 Level 2, Collection 2, Tier 1" 43 .The remotely sensed images utilized in this study were the "USGS Landsat 8 the overview of the auto-NDVI cb algorithm.Figure 2 illustrates the procedure of the proposed Auto-NDVI cb algorithm, which can be conducted totally in GEE.First, images effectively observed throughout the whole year are filtered, and NDVI calculations are performed on these images.Second, an automated selecting function has been developed to choose the reference NDVI image for each coal base, which most closely characterizes the vegetation growth at its peak.Third, a calibration model is constructed, ensuring consistency in the NDVI cumulative distribution function across similar land cover.Finally, all the collected NDVI images covering the coal base are batch-calibrated based on the constructed model and fused to obtain the annual maximum NDVI data.

Calculation of time-series NDVI.
The time-series NDVI images were computed in GEE.Here is the description of the steps involved.First, the Landsat 8 SR dataset was loaded into GEE.Images falling within a designated period and specific geographical region were selected for further processing.Then, date filtering was applied to select images during the most vigorous vegetation growth period in the study area.For most coal bases, the date filtering parameter was set from April to October of each year to avoid extreme phenology.Given the limitations of cloud cover and Landsat revisit cycles, focusing only on summer season images would result in vacant pixel values in the NDVI results.An exception was made for the YG coal base.Frequent cloud coverage at the YG coal base throughout the year results in less valid data, so the date filtering parameter was set from January to December, thus ensuring the completeness of data coverage.The boundary of each coal base was used as a parameter for spatial filtering.Third, the cloud and its shadow were removed for each image after filtering.The cloud removal process for Landsat SR images in GEE focuses on clouds and cloud shadows 45 .The Quality Assessment (QA) band, which stores quality information for each image, was utilized.Pixels with quality information indicating "Cloud" and "Cloud Shadow" were filtered out using bitwise operations, creating a cloud mask.The "updateMask" function in GEE was then used to remove the cloud mask area and obtain cloud-free images 46 .There may be unscreened clouds (e.g., cirrus clouds) after the filtering.However, in our study, we did not find the impact of cirrus clouds on the results.Even if cirrus cloud pixels exist, the Auto-NDVI cb algorithm will exclude them during the calculation of the maximum NDVI value.Finally, NDVI was computed for each image using the Eq. ( 1).
where NIR represents the surface reflectance in the near-infrared band and RED represents the surface reflectance in the red band.
Automatic selection of the reference NDVI image.The reference NDVI image for each coal base needs to most closely characterize the most vigorous vegetation growth in a year.Hence, the reference NDVI image is defined as the NDVI image with the highest average value among all the NDVI images of the coal base in a year.We designed an algorithm to automatically select the reference NDVI image from the NDVI collection in GEE.First, each coal base is covered by several scenes, and each scene covers an area of varying size.To ensure the sufficient coverage area of the reference NDVI image, 2-3 scenes were designated as selectable scenes at each coal base.For example, the JB coal base is covered by scenes of numbers 126032, 125032, 126033, 125033, 126034, and 125034 (Fig. 3).According to the coverage size, scene 126033 (Fig. 3(b)) and scene 125033 (Fig. 3(e)) were designated as selectable scenes at the JB coal base.Second, the images in the selectable scenes with over 40% clouds were removed to avoid severe missing pixels in the reference NDVI image after cloud removal.Finally, we calculate the average value of each NDVI image in the selectable scenes.The NDVI image with the highest average value is selected as the reference NDVI image.
Construction of the calibration model.Tobler's First Law of Geography: All things are related, but nearby things are more related than distant things 42 .Simultaneously acquired Images of nearby areas generally have similar color tones, with very few abrupt changes in overall cliffiness.In other words, the cumulative probability distributions of contemporary NDVI images in nearby areas are similar.In light of this, a calibration model was constructed to harmonize the different time-phase NDVI images (called to-be-calibrated NDVI images) at all the scenes covering a coal base to align with the cumulative probability distribution of the reference NDVI image.Each coal base requires at least 4-9 scenes of images to be fully covered.There are 90 scenes at the 14 coal bases.Each scene has about 20 images per year.Therefore, at a minimum, about 1800 images in a year need to be processed.The construction of the calibration model includes the following steps: (1) The first step is to create a function that calculates the cumulative probabilities of both the reference NDVI image and the to-be-calibrated NDVI images.This function begins by computing the histogram of the NDVI image using the "ee.Reducer.histogram()"reducer.It then derives the Cumulative Distribution Function (CDF) of the NDVI image.Finally, the function outputs a feature set containing cumulative probability values corresponding to each NDVI image.(2) In the second step, a fitting model is constructed for the feature sets, mapping NDVI values from the to-be-calibrated NDVI images to the reference NDVI image.In this paper, the CART classifier function is used to build the fitting model on GEE.The parameter of the CART classifier function is set to "regression" to allow outputting continuous values from a standard regression.Any NDVI value "a i " in the to-be-calibrated NDVI image can find a corresponding NDVI value "b j " on the reference NDVI image so that the cumulative probabilities corresponding to "a i " and "b j " are equivalent (Fig. 4).The input data for this function includes the to-be-calibrated and the reference NDVI images.
Batch calibration and fusion.The to-be-calibrated NDVI images at all the scenes covering a coal base were looped into the algorithm of constructing calibration models to get a calibrated NDVI image collection.Each calibrated NDVI was or was close to the annual maximum NDVI.The values of the calibrated NDVI image collection at each pixel were retrieved and sorted, and the median value was selected to get the annual maximum NDVI image.The function of median fusion can effectively avoid anomalous pixels.Considering the characteristics of discontinuous spatial distribution and the large geographic extent of each coal base, the algorithm was conducted for all 14 coal bases, respectively.

Data Records
The annual maximum NDVI dataset 47 of all 14 coal bases in China from 2013 to 2022 is available to the public at 10.6084/m9.figshare.c.6933322.It includes the following 14 compressed packages of 140 GeoTIFF images: 1) The annual maximum NDVI dataset of Henan (HN) coal base in China (2013-2022).

technical Validation
Accuracy of reconstructing the annual maximum NDVI images.Experiments were conducted to assess the accuracy of the Auto-NDVI cb algorithm by assuming the absence of the annual maximum NDVI image.Since a single sensor was used as a data source, the maximum NDVI true value acquired by that sensor should be used as validation data.The nearby image that has the same imaging time as the reference NDVI image also closely represents the annual maximum NDVI.This image is regarded as the "True image", and assumed missing.Auto-NDVI cb algorithm is conducted to reconstruct the annual maximum NDVI ("Predicted image").The images used for accuracy validation are independent and high-quality data.The Root Mean Square Error (RMSE) between the "Predicted image" and the "True image" is calculated to verify the accuracy as Eq. ( 2).Here we use the example of the JB coal base in 2020 to provide a detailed illustration of the experiments.As shown in Fig. 3, the image acquired on 24 th August with PathRow = 126033 has been selected as the reference NDVI image, most closely characterizing the most vigorous vegetation growth in 2020.The image with PathRow = 126032 was also acquired on 24 th August, and also closely represented the annual maximum NDVI.Hence, the image (PathRow = 126032, 24 th August) was regarded as the "True image" and removed to simulate the absence of the annual maximum NDVI image.The Auto-NDVI cb algorithm was conducted to predict the annual maximum NDVI image based on the reference NDVI image (PathRow = 126033, 24 th August).Finally, the RMSE was calculated.In this study, the same experiments were carried out for all 14 coal bases to validate the accuracy.where n represents the number of pixels in the image; value (x y) Predicted i, i represents the NDVI value of the pixel at position x, y in the "Predicted image"; value (x y) True i, i represents the NDVI value of the pixel at position x, y in the "True image".
Table 2 shows the RMSE of the Auto-NDVI cb algorithm year by year in each coal base.The bold values indicate the maximum and minimum values of RMSE from 2013-2022 for each coal base.The algorithm produced an average RMSE of 0.087 across the 14 coal bases from 2013 to 2022.The algorithm had varying accuracy in different coal bases.It suggests that the highest accuracy is present in the YG coal base with an average RMSE of 0.062, followed by the JZ and JB coal base at 0.071 and 0.079, respectively.The average RMSEs for the JD, LH, LX, HL, SD, HN, and SB coal bases all ranged between 0.080 and 0.088.By contrast, the ND coal base displayed the lowest accuracy, with the highest average RMSE of 0.113.In terms of accuracy in specific years, the YG coal base achieved the lowest RMSE of 0.054 in 2022, while the ND coal base had the highest RMSE of 0.132 in 2015.The YG coal base belongs to the broad-leaved evergreen forest area, where the land cover is dominated by vegetation.Moreover, the vegetation grew vigorously, and the land cover between the scenes was close to each other.On the contrary, the ND coal base belongs to the plateau-continental climate area, with sparse surface vegetation and large heterogeneity of land covers between different scenes, but the RMSE of 0.132 in the ND coal base is also acceptable.The maximum inter-annual differences of RMSE vary from 0.020 to 0.045 with different coal bases, with an average of 0.031.In terms of all the coal bases, the average RMSE varies from 0.081 to 0.093 with different years.Figure 5   and Table 2 show that the accuracy of the Auto-NDVI cb algorithm fluctuates little from year to year and remains stable.Overall, the Auto-NDVI cb algorithm exhibits a satisfactory performance across the 14 coal bases.

Applicability of the Auto-NDVI cb algorithm to the 14 coal bases. The theoretical basement of
Auto-NDVI cb is Tobler's First Law of Geography.So, when the to-be-calibrated NDVI image is far away from the reference NDVI image, the accuracy of Auto-NDVI cb may be lower.Experiments were conducted to assess the applicability of the Auto-NDVI cb algorithm at the scale of coal bases and measure the accuracies of different scenes.
The JB coal base is also taken as an example to illustrate the experiments, and the accuracies of scenes 125032, 125033, 125034, and 126034 were measured.First, the image that is closest in each scene to the time of the reference NDVI image was selected as the "True image".Second, the True image of each scene was removed to simulate the absence of the annual maximum NDVI image, respectively.Third, the reconstructed annual maximum NDVI images (called "Predicted image") were produced by the Auto-NDVI cb algorithm.Finally, the RMSE between the True images and the Predicted images were calculated, respectively.Table 3 shows the accuracy of each scene covering the 14 coal bases.We define a term called "image distance".The image distance is the sum of the number of Path and Row differences between the to-be-calibrated NDVI image and the reference NDVI image as Eq. ( 3). Figure 6 shows the variation of RMSE with image distance.Among all the scenes, the lowest RMSE is scene 122035 of LH coal base, with an image distance of 1 and an RMSE of 0.063.The highest RMSE is scene 144030 of the XJ coal base, with an image distance of 4 and an RMSE of 0.140.In terms of specific coal bases, for example, in JD coal base, the image distance of scene number 125035 is 1, and the RMSE is 0.068; the image distance of scene numbers 125036 and 126035 is 2, and the RMSE is 0.082 and 0.086, respectively.All 14 coal bases show that the farther the distance between the to-be-calibrated image and the reference NDVI image, the higher the RMSE.The farthest image distance is 4, and the accuracy of the algorithm is still acceptable (0.140).

Impact of Imaging Time on Accuracy.
This experiment aims to assess the accuracy of the Auto-NDVI cb algorithm for to-be-calibrated NDVI images from different months.The experimental process is illustrated in Fig. 7 (JB coal base as an example).The nearby image that has the same imaging time as the reference NDVI image is regarded as the "True image" and removed.The to-be-calibrated image in each month is calibrated by Auto-NDVI cb to produce the "Predicted image" (e.g., the April and May in Fig. 7).If there is not only one image (usually two images) in a month, the images in this month are calibrated respectively, and is fused by "median" function in GEE to produce the "Predicted image" in this month (e.g., the June and July in Fig. 7).Finally, the RMSE between the True image and the Predicted image for each month is calculated.This experiment has been carried out for each coal base, respectively.
Figure 8 shows the variation of RMSE with months at each coal base.The RMSE showed an obvious trend of decreasing and then increasing.The months with the highest accuracy were mainly July and August.The RMSE of YG, JIZ, and HL coal bases was lowest in July, and the RMSE of SD, JZ, LH, HN, JB, MD, LX, SB, and ND coal bases was lowest in August.RMSE in XJ was lowest in June, and RMSE in JD was lowest in September.
The month with the lowest RMSE usually coincided with the month in which the maximum NDVI was recorded, i.e., the month of the reference NDVI image.For instance, in terms of the JZ coal base, the reference NDVI image was acquired on 24 th August and the lowest RMSE of 0.063 also appeared in August.Similarly, the reference NDVI image for the YG coal base was acquired on 11 th July, and the lowest RMSE was present in July with 0.061.This trend is also observed for the HL, SD, JZ, LH, HN, JB, MD, LX, SB, and JD coal bases.However, there are some exceptions.For example, the ND coal base does not display the lowest RMSE in September, the month of the reference NDVI image.This misalignment can be attributed to the date of reference NDVI image, i.e., 1 st September, which is close to August for the lowest RMSE.Although the month with the lowest RMSE in the XJ and JIZ coal bases did not exactly align with the month of the reference NDVI image, the difference between the lowest RMSE and the RMSE in the month of the reference NDVI image is small (XJ: 0.090 vs. 0.093; JIZ: 0.083 vs.0.088).These results suggest that the accuracy of the Auto-NDVI cb is related to the difference in imaging time between the to-be-calibrated NDVI image and the reference NDVI image.However, even when the difference in imaging time is the largest, for example, the to-be-calibrated NDVI image in January (YG coal base) or April (JB coal base), the largest RMSE is 0.121 (YG coal base) and 0.126 (JB coal base).It indicates that the imaging time of the to-be-calibrated NDVI image has an effect on the accuracy, but the effect is limited and acceptable.Comparison with other NDVI datasets and methods.In this experiment, the annual maximum NDVI computed by the Auto-NDVI cb algorithm was compared with NDVI images from the publicly available dataset and NDVI computed by the qualityMosaic method in GEE.The two publicly available datasets are the "32-Day NDVI Composite" 48 produced by USGS (referred to as the "USGS" dataset) and the "China annual maximum NDVI dataset (30 m, 2000-2020)" 31 (referred to as the "CNMD" dataset).The "USGS" dataset is made from Tier 1 orthorectified scenes, using the computed top-of-atmosphere (TOA) reflectance.These composites are created from all the scenes in each 32 days beginning from the first day of the year and continuing to the 352nd day of the year.The last composite of the year, beginning on day 353, will overlap the first composite of the following year by 20 days.All the images from every 32 days are included in the composite, with the most recent pixel as the composite value, which is publicly available on GEE.In this study, we extracted the annual maximum NDVI at the pixel scale.The "CNMD" dataset utilizes all Landsat5/7/8/9 remote sensing data throughout the year to obtain the maximum value of NDVI in one year for each pixel from 2000 to 2020 using series data preprocessing and data smoothing.The dataset is produced based on the GEE cloud computing platform and publicly released in the National Ecosystem Science Data Center.The dataset has a spatial resolution of 30 m and a temporal resolution of years.
Figure 9 shows the annual maximum NDVI results for the YG, MD, SB, and LX coal base from the "USGS" dataset, the qualityMosaic method, the "CNMD" dataset, and the Auto-NDVI cb algorithm.In Fig. 9(a,b), the mosaic lines were noticeable.For the "USGS" dataset, the TOA data was used for calculating NDVI, which lacked atmospheric correction.Hence, the NDVI values in Fig. 9(a) were significantly lower than others.The qualityMosaic method can calculate NDVI using Landsat SR data, but it usually suffers from the problem of missing data during the most vigorous vegetation growth period due to the satellite revisiting capability and cloud cover.The missing data were replaced by data at other times in the qualityMosaic method.Consequently, the annual maximum NDVI images exhibited distinct mosaic lines, and some areas had considerably lower NDVI values compared to the real annual maximum NDVI values.Figure 9(c) displays the annual maximum NDVI results from the "CNMD" dataset.While this dataset shows slight traces of mosaic lines in the visualization, the overall NDVI values were lower.To address the problems of missing data during the peak of vegetation growth, time-series images were interpolated and subjected to SG filtering.However, this approach could not fully recover the annual maximum NDVI, leading to lower NDVI values overall.Figure 9(d) shows the annual maximum NDVI results generated by the Auto-NDVI cb algorithm.These results do not show mosaic lines in the visualization, and the values are closer to the real annual maximum NDVI values.In comparison, the Auto-NDVI cb algorithm is capable of producing annual maximum NDVI with better temporal consistency at the scale of coal bases.
We conducted a detailed comparison of annual maximum NDVI data in time series from different datasets and methods using a sample area within the YG coal base, as shown in Fig. 10.The USGS dataset is available until 2021.However, its NDVI values were significantly lower than those of the other three datasets and exhibited noticeable mosaic lines.The qualityMosaic method resulted in an NDVI that highlights a distinct mosaic line for the years 2018 and 2019.The CNMD dataset is available until 2020.In terms of visualization, it displays horizontal stripes in NDVI images.Moreover, the NDVI values for 2019 and 2020 were abnormally lower in comparison to the first six years.
To further compare different NDVI datasets, two sample points were randomly chosen and the NDVI time-series plots for the "USGS" dataset, qualityMosaic method, "CNMD" dataset, and Auto-NDVI cb algorithm were plotted (Fig. 11).Combined with Fig. 10, it can be seen that Fig. 11(c) "QM" data in 2018 NDVI values plummeted may not attribute to vegetation damage, but the absence of remote sensing image at the moment of the most vigorous growth of vegetation.The results suggest that the Auto-NDVI cb algorithm maintained consistently stable annual maximum NDVI values, even during years when abnormal NDVI values were produced by the "CNMD" dataset and qualityMosaic method.Furthermore, the changing pattern of NDVI generated by the Auto-NDVI cb algorithm over years is consistent with that of the NDVI from the "CNMD" dataset and the quali-tyMosaic method.Consequently, these results demonstrate that the Auto-NDVI cb algorithm reliably delivers the annual maximum NDVI and displays better consistency over distinct years.(1) Auto-NDVI cb is only based on Landsat as the data source and does not require data from other satellite sensors (e.g., MODIS) to fuse and interpolate the missing NDVI data with different spatial resolutions.Sensors such as MODIS or AHI provide data that are suitable for spatially homogeneous vegetation over a range of several hundred meters.However, few scenarios meet this vegetation requirement, especially in mining areas.In this paper, we would like to provide an annual maximum NDVI with higher resolution and data quality to facilitate a more accurate analysis of the ecological changes in the mining area.Sensor fusion may bring some new problems.The NDVI calculated by different sensors is inconsistent due to spatial 49 , temporal 50 , radiometric, and spectral factors.The single data source reduces the systematic bias caused by heterogeneous sensors as well as the loss of data detail information caused by low resolution 50,51 .The Auto-NDVI cb algorithm proposed in this paper avoids these problems by using a single sensor and finishing the task of generating annual maximum NDVI with satisfying accuracy.Future of the auto-NDVI cb algorithm.While the Auto-NDVI cb algorithm has achieved satisfactory accuracy in producing the annual maximum NDVI of the 14 coal bases in China, it still has some limitations to address in the future.First, the algorithm was based on Tobler's First Law of Geography and the consistency of the cumulative distribution probability function of similar features.However, it's important to recognize that features from different scenes are not entirely the same.Consequently, the accuracy exhibited variations with different coal bases.The algorithm performs well when there is a high level of land cover consistency or when the coal base is relatively small.As the distance between the reference NDVI image and the to-be-calibrated NDVI image increases, the accuracy of the algorithm decreases.The accuracy with image distance >4 cannot be confirmed.While this Fig. 10 The annual maximum NDVI data in time series from different datasets and methods.study demonstrated the effectiveness of Auto-NDVI cb at the coal-base scale, further research is needed at a larger spatial scale to explore the improvement of Auto-NDVI cb in producing the NDVI across the whole of China and even the world.
Second, the accuracy of the annual maximum NDVI was closely related to the quality of the reference NDVI image.In an extreme case, the accuracy of the annual maximum NDVI may not be guaranteed when Landsat images are missing so severely that there is not an available image for the entire study area during periods of vigorous vegetation.Although such a case did not occur within the 14 coal bases examined in the past ten years, it remains a theoretical possibility when applying the algorithm to other regions.

Fig. 1
Fig.1The location of 14 coal bases in China.

Fig. 2
Fig. 2 Procedure of the Auto-NDVI cb algorithm.

Fig. 3
Fig. 3 Coverage of Landsat scenes at JB coal base as an example.

Fig. 4
Fig.4 The calibration model (The NDVI values have been scaled up by a factor of 1,000).
displays the distribution of RMSE for 14 coal bases from 2013 to 2022.Outliers were only observed in the LH and JD coal bases, exhibiting RMSE values of 0.103 and 0.108 correspondingly.Taken together, Fig.5

Fig. 6
Fig.6 The variation of RMSE with image distance.(a-n) for 14 coal bases.

Fig. 7
Fig. 7 Calibration of NDVI images month by month (JB coal base as an example).

Fig. 8
Fig. 8 RMSE for different months in 14 coal bases.The cyan bar denotes the month of the reference NDVI image.

( 2 )
Auto-NDVI cb can be automatically and completely performed in GEE, including the image filtering, cloud removal, NDVI calculation, selection of the reference NDVI image, construction of the calibration model, batch calibration, and the last fusion to generate annual maximum NDVI.While the calibration of a single image can be achieved using software like The Environment for Visualizing Images (ENVI) or ERDAS IMAGINE, it becomes cumbersome when dealing with study areas covered by multiple satellite scenes and multiple time-phase images.For instance, the YG coal base required data from at least 7 scenes, totalling around 168 images per year for a single coal base.When the research was extended to 14 coal bases and carried out for more than ten years, it was impractical to download the images, then filter the reference NDVI images from many images, and calibrate a large number of NDVI images.

Fig. 11
Fig. 11 The time series plots of NDVI for different datasets.(a) The true-color composite of remote sensing images on the sample area; (b) Mean values of NDVI in the sample area; (c) Time-series NDVI at point 1; (d) Time-series NDVI at point 2.

Table 1 .
Publicly Available NDVI datasets.AVHRR: Advanced Very High Resolution Radiometer NOAA:

Table 2 .
RMSE of the Auto-NDVI cb algorithm in 14 coal bases from 2013 to 2022.

Table 3 .
Accuracy of each scene of 14 coal bases.